DETAILED ACTION
Response to Amendment 

1. Applicant's arguments filed 01/05/2006 in response to the Office Action of 10/05/2005 have
been received. The proposed changes are approved by the examiner: amended claims 1, 11, 18,
and 34-35; previously presented claims 27, 29, and 31; and canceled claim 7.

Response to Arguments 

2. The applicant's arguments have been fully considered but they are not persuasive for
the following reasons:

Rejection of claims 1-6 and 8-36

Applicant argues that Brown (5,719,997) merely instantiates selected portions of the
grammar over time. Brown's instantiation is an allocation of memory: it implements a top level
grammar in Fig. 5 (such as SIZE) and a sub-grammar in Fig. 8 (such as "LARGE", "MEDIUM",
"SMALL"); moreover, Brown's phoneme is shown in Fig. 1, element 120. Thus Brown's
instantiation is allocation of memory via a hierarchical grammar structure: top level grammar,
sub-grammar, and phoneme (col. 4, lines 14-23).

Applicant argues that Brown does not explicitly teach data structures sent through a
communication channel by a remote computer, or that the set of data structures is generated by
the speech recognition system using information provided at least in part by a remote computer.
Examiner respectfully disagrees. Brown discloses memory allocation via instantiation of
grammar, and Ehsani et al. (2002/0032564) disclose the recognition "grammar", in which states
are implemented as a data structure. Ehsani describes the recognition "grammar", which uses
"phonetic" transcription, "word" sequences, and probability (states) to process the voice
commands (page 11, paragraph 0212). Ehsani further discloses remote access over a
communication channel via a voice telephony server with speech recognition for remote access
of databases via voice commands (page 11, paragraph 0200), in order to extend the capability to
access external databases or control applications or devices, as taught by Ehsani (paragraph
0200). In this regard, the examiner respectfully maintains the rejection of claims 1, 2, 8-26, 28,
30 and 32-35.

Applicant argues that Brown does not teach data structures sent through a communication
channel by a remote computer, or that the set of data structures is generated by the speech
recognition system using information provided at least in part by a remote computer. Examiner
points to Brown in view of Ehsani et al. (2002/0032564). Ehsani teaches a voice telephony
server with speech recognition for remote access of databases via voice commands (page 11,
paragraph 0200), in order to extend the capability to access external databases or control
applications or devices, as taught by Ehsani (paragraph 0200). Ehsani further teaches an
application of recognition "grammars" via "remote" voice control (page 11, paragraph 0200).
Grammar elements such as word, phone, and states are used in a data structure. Ehsani describes
the recognition "grammar", which uses states that are implemented in a data structure, and
which uses "phonetic" transcription, "word" sequences, and probability (states) to process the
voice commands (page 11, paragraph 0212).

Applicant argues that Brown (5,719,997) merely instantiates selected portions of the
grammar over time. Brown's instantiation is an allocation of memory: it implements a top level
grammar in Fig. 5 (such as SIZE) and a sub-grammar in Fig. 8 (such as "LARGE", "MEDIUM",
"SMALL"); moreover, Brown's phoneme is shown in Fig. 1, element 120. Thus Brown's
instantiation is allocation of memory (col. 4, lines 14-23) via top level grammar,
sub-grammar, and phoneme.

Applicant argues that Ehsani (2002/0032564) fails to suggest the novel invention. A bare
assertion that Brown and Ehsani (either singly or in any permissible combination) fail to
disclose the novel invention is not grounds for traversing the rejection. Both prior art
references, Brown and Ehsani, teach the claimed limitations, which in combination would have
been obvious to one of ordinary skill in the art. One would have been motivated to combine
Ehsani's disclosed phrase recognition voice control with Brown's large vocabulary speech
recognition system to implement a speech recognition system for the purpose of enabling
users to have greater access to information by using a remote computer.

Applicant argued that the grammars used in processing speech signals are not provided by a
remote computer. Examiner respectfully disagrees. Ehsani implements grammar used to process
the speech signal provided by a remote computer or server via a voice telephony server with
speech recognition for remote access of databases via voice commands (page 11, paragraph
0200). In this regard, the examiner respectfully maintains the rejection of claims 3-7, 16-17 and
36.

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness
rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

4. Claims 1-6, 8-36 are rejected under 35 U.S.C. 103(a) as being unpatentable over Brown et al. 
(5,719,997) in view of Ehsani et al. (2002/0032564). 

As for claim 1, Brown teaches a method for allocating memory in a speech recognition 
system comprising the steps of: 

acquiring a first set of data structures that contain a grammar, 
a word sub-grammar, a phone sub-grammar and a state sub-grammar, each of the 
sub-grammars related to the grammar (Fig 1, col. 3, lines 41-42); 

acquiring a speech signal (speech input, column 1, lines 26-28); 






Application/Control Number: 09/894,898 Page 6 

Art Unit: 2654 

performing a probabilistic search using the speech signal as an input, and using the first 
set of data structures as possible inputs (". . .mixture probability processor. . .grammar processor" 
column 1 3 lines 39-40); 

and allocating memory for one of the sub-grammars when a transition to that sub-grammar
is made during the probabilistic search ("... evolutional grammar" instantiated when
needed, column 8, lines 8-18 and 11-23, and column 2, lines 16-18; "de-instantiated ...",
column 2, lines 23-25).

Brown et al. do not explicitly teach implementing a remote computer.

However, Ehsani et al. do teach wherein the first set of data structures is generated by the
speech recognition system based at least in part on a grammar provided by a remote
computer (voice telephony server with speech recognition for remote access of databases via
voice commands, page 11, paragraph 0200).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to combine Brown et al.'s data structure with Ehsani's remote
appliance/computer because this would provide users with flexibility; users would thus have the
capability to access external databases or control applications or devices, as taught by Ehsani
(page 11, paragraph 0200 and page 13, paragraphs 0230-0231).

As to claim 2, which depends on claim 1, Brown et al. teach 

that the probabilistic search is a Viterbi beam search (column 1, lines 41-42).

As to Claim 3, which depends on claim 1, Brown et al. do not explicitly teach sending
data structures through a communication channel by a remote computer.

However, Ehsani does teach that the set of data structures for a voice-user interface is sent
through a communication channel by a remote computer and that the set of data structures is
generated by the speech recognition system using information provided at least in part by a
remote computer (voice telephony server with speech recognition for remote access of databases
via voice commands, page 11, paragraph 0200).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to use Brown et al.'s data structure in Ehsani's remote appliance/computer
because this would optimize both the design and performance of speech applications by
generating means via a remote control appliance or computer desktop application, as taught by
Ehsani (page 11, paragraphs 0200-0202).

As to Claim 4, which depends on claim 3, Brown et al. do not explicitly teach a web page.

However, Ehsani does teach a set of data structures included in code that defines a web
page and data structures associated with one or more web pages ("voice page(s)" or "codes"
is/are represented by data (data structure) for both structure and content of the Web page, and
"enables interaction with the Web page using audio input from speech", page 13, paragraph
0231).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to use Brown et al.'s data structure in Ehsani's voice access of web
page(s) because this would provide the user with interaction with a Web page using audio input
speech or tone(s) (Ehsani, page 11, paragraphs 0200-0202).

As to Claim 5, which depends on claim 3, Brown et al. do not explicitly teach one web
page.

However, Ehsani does teach a set of data structures included in code that defines a web
page and data structures associated with one or more web pages ("voice page(s)" or "codes"
is/are represented by data (data structure) for both structure and content of the Web page, and
"enables interaction with the Web page using audio input from speech", page 13, paragraph
0231; data structures are "voice page" instructions for a conventional Web page and include the
speech application in Fig. 6).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to use Brown et al.'s data structure in Ehsani's voice access of web
page(s) because this would enable the user to access "voice pages"; the user can thus use their
voice rather than filling out interactive forms on the Web using a keyboard or mouse, as taught
by Ehsani, page 13, paragraphs 0230-0231.

As to Claim 6, which depends on claim 1, Brown et al. do not explicitly teach data
structure selection via a remote computer.

However, Ehsani does teach that a set of data structures (voice-interface application database
located in telephone server) is selected by a remote computer (internet) (telephone input
controls Internet applications, voice server, linked over the Internet, page 11, paragraph 0200).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to use Brown et al.'s data structure in Ehsani's remote access of web
page(s) because this would enable the voice based server to control Internet applications, since
speech recognition technology is being used to facilitate communication where the use of input
modalities is impossible or inconvenient, as taught by Ehsani, page 11, paragraph 0200.

As to Claim 8, which depends on claim 1, Brown et al. teach 

acquiring a second set of data structures that contain a second grammar, a second word 
sub-grammar, a second phone sub-grammar, and a second state sub-grammar, each of the 
second sub-grammars related to the second grammar (Fig. 8, second grammar, node c t>, is related 
to the sub-grammar, Fig. 5 "size") (The data structures for instantiations of HMM are used to
allocate memory, which will replace grammar by "de-instantiating grammar" that is no longer
needed. De-instantiating grammar includes sub-grammars because non-terminal tables are used
to define all "sub-grammars" within the system, and non-terminal tables are an EHMM
(de-instantiated or Ephemeral HMM) creation table. Instantiated portions of the grammar that
are de-instantiated are replaced by others that are instantiated. Instantiations and
de-instantiations are done during the speech recognition processing. Column 4, lines 19-24;
column 2, lines 23-24; column 12, lines 15-16; column 12, lines 13-14; column 9, lines 58-60;
and column 9, lines 11-16).

As to Claim 9, Brown in view of Ehsani discloses all the limitations of claim 8, upon which
claim 9 depends. Brown further discloses:

the second set of data structures replace the first set of data structures (Fig. 8, second
grammar, node C b, "large medium small" replaces the sub-grammar "size" in Fig. 5; column 4,
lines 19-24; column 2, lines 23-24).

As to Claim 10, Brown in view of Ehsani discloses all the limitations of claim 8, upon which
claim 10 depends. Brown further discloses:

the second set of data structures is acquired while the speech recognition system is
operating (the grammar relates to inputs that have already been received and processed;
instantiation of HMM, allocating memory space, establishing data structure within space needed
to process phone scores, col. 2 lines 24-27, col. 3 lines 24-29, and col. 4 lines 14-21; thus the
data structures established when processing phone scores and performing grammar processing
are acquired while the recognizer is operating, because the recognizer is necessarily operating
through the HMM in order to process the instantiation of HMM in memory space, i.e., real time
instantiation).

As to claim 11, Brown teaches, in a speech recognition system, a method for recognizing
speech comprising the steps of:

acquiring a first set of data structures that contain a grammar,
a word sub-grammar, a phone sub-grammar and a state sub-grammar, each of the
sub-grammars related to the grammar (the data structures for instantiations of HMM
are used to allocate memory; the recognition system includes "phone", "word", "grammar" and
"sub-grammars"; column 4, lines 19-24; column 11, lines 39-41; and column 3, lines 40-45);

acquiring a speech signal (speech input, column 1, lines 26-28);

performing a probabilistic search using the speech signal as an input, and using the first 
set of data structures as possible inputs (Fig 1); 

allocating memory for one of the sub-grammars when a transition to that sub-grammar is
made during the probabilistic search (grammar processor (sub-grammars) causes the word
probability processor to instantiate (allocate memory), column 8, lines 11-14); and

computing a probability of a match between the speech signal and an element of the
sub-grammar for which memory has been allocated ("speech input" is compared to "stored
acoustic features representative of words" (examiner is reading this as 'memory') contained in a
selected grammar, column 1, lines 26-30).

Brown et al. do not explicitly teach implementing a remote computer.

However, Ehsani teaches wherein the first set of data structures is generated by the speech
recognition system based at least in part on a grammar provided by a remote computer
(voice telephony server with speech recognition for remote access of databases via voice
commands, page 11, paragraph 0200).

It would have been obvious to one of ordinary skill in the art at the time the invention
was made to combine Brown et al.'s data structure with Ehsani's remote appliance/computer
because this would provide users with flexibility; users would thus have the capability to access
external databases or control applications or devices, as taught by Ehsani, page 11, paragraph
0200.

As to Claim 12, which depends on claim 11, Brown et al. teach

the probabilistic search is a Viterbi beam search ("beam" searching ... "Viterbi ...",
column 1, lines 41-42).

As to Claim 13, which depends on claim 11, Brown et al. teach 
acquiring a second set of data structures that contain a second grammar, a second word 
sub-grammar, a second phone sub-grammar, and a second state sub-grammar, each of the 
second sub-grammars related to the second grammar (Fig. 8, second grammar, node C b, is related 
to the sub-grammar, Fig. 5 "size") (the data structures for instantiations of HMM are used to
allocate memory, which will replace grammar by "de-instantiating grammar" that is no longer
needed. De-instantiating grammar includes sub-grammars because non-terminal tables are used
to define all "sub-grammars" within the system, and non-terminal tables are an EHMM
(de-instantiated or Ephemeral HMM) creation table. Column 4, lines 19-24; column 2, lines
23-24; column 12, lines 15-16; column 12, lines 13-14; and column 9, lines 58-60).

As to Claim 14, which depends on claim 11, Brown et al. teach 

the second set of data structures replace the first set of data structures (Fig. 8, second
grammar, node C b, "large medium small" replaces the sub-grammar "size" in Fig. 5; column 4,
lines 19-24; column 2, lines 23-24).

As to Claim 15, which depends on claim 13, Brown et al. teach

the second set of data structures is acquired while the speech recognition system is
operating (real time instantiating of grammar; the grammar relates to inputs that have already
been received and processed; instantiation of HMM, allocating memory space, establishing data
structure within space needed to process phone scores, col. 2 lines 24-27, col. 3 lines 24-29, and
col. 4 lines 14-21; thus the data structures established when processing phone scores and
performing grammar processing are acquired while the recognizer is operating, because the
recognizer is necessarily operating through the HMM in order to process the instantiation of
HMM in memory space).

As to Claim 16, which depends on claim 11, Brown et al. do not explicitly teach a web
page.

However, Ehsani does teach a set of data structures included in code that defines a web
page and data structures associated with one or more web pages ("voice page(s)" or "codes"
is/are represented by data (data structure) for both structure and content of the Web page, and
"enables interaction with the Web page using audio input from speech", page 13, paragraph
0231).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to use Brown's data structure in Ehsani's voice access of web page(s)
because this would provide the user with interaction with a Web page using audio input speech
or tone(s) (Ehsani, page 13, paragraphs 0230-0231).

As to Claim 17, which depends on claim 15, Brown et al. do not explicitly teach
one web page.

However, Ehsani does teach a set of data structures included in code that defines a web
page and data structures associated with one or more web pages ("voice page(s)" or "codes"
is/are represented by data (data structure) for both structure and content of the Web page, and
"enables interaction with the Web page using audio input from speech", page 13, paragraph
0231; data structures are "voice page" instructions for a conventional Web page and include the
speech application in Fig. 6).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to use Brown et al.'s data structure in Ehsani's voice access of web
page(s) because this would enable the user to access "voice pages"; the user can thus use their
voice rather than filling out interactive forms on the Web using a keyboard or mouse, as taught
by Ehsani, page 13, paragraph 0230.

As to claim 18, Brown teaches a method for recognizing speech comprising 
the steps of: 

acquiring a first set of data structures that contain a top level
grammar and a plurality of sub-grammars, each of the sub-grammars hierarchically
related to the grammar and to each other (column 3, lines 14-15; column 8, lines 65-67; and
column 9, lines 1-4);

acquiring a speech signal (speech input, column 1, lines 14-17); 

performing a probabilistic search using the speech signal as an input, and
using the first set of data structures as possible inputs ("... mixture probability
processor ... grammar processor", column 1, lines 39-40);

allocating memory for specific sub-grammars when transitions to those specific
sub-grammars are made during the probabilistic search (grammar processor ("sub-grammars")
causes the word probability processor to "instantiate" (allocate memory), column 8, lines
11-14); and

computing probabilities of matches between the speech signal and elements of the
sub-grammars for which memory has been allocated ("speech input" is compared to "stored
acoustic features representative of words" (examiner is reading this as 'memory') contained in a
selected grammar, column 1, lines 26-30).

Brown et al. do not explicitly teach implementing a remote computer.

However, Ehsani does teach wherein the first set of data structures is generated by the
speech recognition system based at least in part on a grammar provided by a remote
computer (voice telephony server with speech recognition for remote access of databases via
voice commands, page 11, paragraph 0200).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to combine Brown et al.'s data structure with Ehsani's remote
appliance/computer because this would provide users with flexibility; users would thus have the
capability to access external databases or control applications or devices, as taught by Ehsani
(page 11, paragraph 0200 and page 13, paragraphs 0230-0231).

As to Claim 19, which depends on claim 18, Brown et al. teach

the top level grammar includes one or more word sub-grammars, the word sub-grammars
including words that are related according to word-to-word transition probabilities
("N-tuple grammar", column 11, line 45).

As to claim 20, which depends on claim 19, Brown et al. teach 

each word in a word sub-grammar includes one or more phone sub-grammars, the phone
sub-grammars including phones that are related according to phone-to-phone transition
probabilities ("Word probability processor 125 contains a) prototypical word models--
illustratively Hidden Markov Models (HMMs)--for the various words that the system of FIG. 1
is capable of recognizing, based on concatenations of phone representations", column 4, lines
14-17).

As to claim 21, which depends on claim 20, Brown et al. teach 

each phone in a phone sub-grammar includes one or more state sub-grammars, the state
sub-grammars including states that are related according to state-to-state transition probabilities
("Three state ... phone representation ... each state ... phone probability processor generates
triphone probabilities from component", column 10, lines 58-64).

As to claim 22, which depends on claim 21, Brown et al. teach 

the probabilities of matches between the speech signal and elements of the sub-grammars
for which memory has been allocated are computed using one or more probability distributions
associated with each state ("Hidden Markov Models with multivariate Gaussian distribution",
column 10, lines 38-41).

As to claim 23, which depends on claim 22, Brown et al. teach 

that when a word is allocated in memory, an initial phone for the word and an initial state 
for the initial phone are also allocated in memory ("stores a lexicon of phonetic word spellings 
for the vocabulary words which are keyed on the word index. The Phonetic Lexicon table is 
used to build an internal structure when instantiating an EHMM", column 12, lines 26-27). 

As to claim 24, which depends on claim 23, Brown et al. teach 

one or more subsequent states are allocated in memory until the end of the phone is reached,
the allocation based on a transition probability at each state ("Phonetic table ... are loaded into
the grammar processor", column 13, lines 30-31).

As to claim 25, which depends on claim 24, Brown et al. teach

one or more subsequent phones are allocated in memory until the end of the word is reached,
the allocation based on a transition probability at each phone ("... input comprises phone scores
that were generated by phone probability processor ...", column 5, lines 25-29 and Fig. 2).

As to claim 26, which depends on claim 21, Brown et al. teach

when a state probability falls below a state threshold, the state is de-allocated from
memory ("... drop below it, it can be safely assumed that that portion of the network relates to
input that has already been received and processed and it is at that point that the model is
de-instantiated", column 2, lines 41-43).

As to claim 27, which depends on claim 26, Brown et al. teach 

the state threshold is dynamically adjustable (HMM requires different instantiation for
each different appearance of the word in question within the grammar, col. 4 lines 19-26 and
col. 9 lines 25-45).

As to claim 28, which depends on claim 21, Brown et al. teach

that when a phone probability falls below a phone threshold, the phone is de-allocated
from memory ("... the HMM are instantiated when needed and de-instantiated when no longer
needed, called EHMMs", column 9, lines 57-60, and "HMM have first risen above a predefined
threshold and thereafter all drop below it ... process ... de-instantiated", column 2, lines 40-45).

As to claim 29, which depends on claim 28, Brown et al. teach

the phone threshold is dynamically adjustable (col. 10 lines 62-67 and col. 11 lines 1-5).

As to claim 30, which depends on claim 21, Brown et al. teach

that when a word probability falls below a word threshold, the word is de-allocated from
memory ("... drop below it, it can be safely assumed that that portion of the network relates to
input that has already been received and processed and it is at that point that the model is
de-instantiated", column 2, lines 41-43).

As to claim 31, which depends on claim 21, Brown et al. teach

the word threshold is dynamically adjustable (word probability score processor is
expanded via the phrase path, col. 4 lines 27-42 and 65-67, and col. 11 lines 12-17; thus the
score is dynamically adjustable based on the phrase path).

As to claim 32, which depends on claim 26, Brown et al. teach 

that when all the states associated with a phone are de-allocated from memory, the phone
is de-allocated from memory ("By de-instantiated we mean that, at a minimum, phone score
processing and the propagation of hypothesis scores into such portions of the grammar, e.g., a
particular HMM", column 9, lines 12-15; grammar comprises words, column 11, lines 39-41;
and "HMM are instantiated only as needed and de-instantiated when no longer needed",
column 9, lines 57-60).

As to claim 33, which depends on claim 32, Brown et al. teach 

that when all the phones associated with a word are de-allocated from memory, the word
is de-allocated from memory ("By de-instantiated we mean that, at a minimum, phone score
processing and the propagation of hypothesis scores into such portions of the grammar, e.g., a
particular HMM", column 9, lines 12-15; grammar comprises words, column 11, lines 39-41;
and "HMM are instantiated only as needed and de-instantiated when no longer needed",
column 9, lines 57-60).

As to claim 34, Brown teaches a method for allocating memory in a speech recognition
system comprising the steps of:

acquiring a set of data structures that contain a grammar and one
or more sub-grammars related to the grammar (grammar processor; non-terminal grammatical
rules are used to dynamically generate finite-state sub-grammars comprising words; column
11, lines 39-40);

acquiring a speech signal (recognizing speech and other inputs; column 1, lines 14-16);

performing a probabilistic search using the speech signal as an input, and using the first
set of data structures as possible inputs (grammar instantiated in response to any particular input
utterance; column 8, lines 55-56); and

allocating memory for a selected one or more of the sub-grammars when a transition to
the selected sub-grammar is made during the probabilistic search (rather, as processing of input
speech begins, grammar processor causes word probability processor to instantiate an initial
portion of the grammar, column 8, lines 14-16).

Brown et al. do not explicitly teach implementing a remote computer.

However, Ehsani et al. teach wherein the first set of data structures is generated by the
speech recognition system based at least in part on a grammar provided by a remote
computer (voice telephony server with speech recognition for remote access of databases via
voice commands; page 11, paragraph 0200).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to combine Brown et al.'s data structure with Ehsani's remote
appliance/computer because this would provide users with flexibility; users would thus have the
capability to access external databases or control applications or devices, as taught by Ehsani,
page 11, paragraph 0200 and page 13, paragraphs 0230-0231.

As to claim 35, Brown et al. teach in a speech recognition system, a method for 
recognizing speech comprising the steps of: 

(a) acquiring a set of data structures that contain a grammar and one or more
sub-grammars related to the grammar (grammatical rules; generate finite-state sub-grammars
comprising words; column 11, lines 39-40); (b) receiving spoken input signal (recognizing
speech and other inputs; column 1, lines 14-16); (c) using one or more of the data structures to
recognize the spoken input (data structure used to process phone scores, column 4, lines 22-23,
and phone representation is a phonetic model of speech signal; column 4, lines 6-7); (d) while
the speech recognition system is operating, acquiring a second set of data structures that contain
a second grammar and one or more sub-grammars related to the second grammar (Fig. 14); and
(e) repeating steps (b) and (c), using the second set of data structures in step (c) (word
probability processor contains data structure for instantiation of HMM; column 4, lines 18-23
and Fig. 14).

Brown et al. do not explicitly teach implementing a remote computer.

However, Ehsani et al. do teach wherein the first set of data structures is generated by the
speech recognition system based at least in part on a grammar provided by a remote
computer (voice telephony server with speech recognition for remote access of databases via
voice commands; page 11, paragraph 0200).







Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to combine Brown et al.'s data structure into Ehsani's remote
appliance/computer because this would provide users with flexibility, giving users the
capability to access external databases or control applications or devices, as taught by Ehsani,
page 11, paragraph 0200 and page 13, paragraphs 0230-0231.

As to claim 36, Brown et al. teach, in a speech recognition system, a method for
recognizing speech comprising the steps of:

(b) receiving spoken input signal (speech input; column 1, lines 14-17); (c) using
one or more of the data structures to recognize the spoken input (the data structure is used for
memory, which comes from the word probability; the word probability gets its data from
spoken input; column 1, lines 24-28 and column 4, lines 19-24); (d) while the speech recognition
system is operating, acquiring a second set of data structures from the first remote computer or
from a second remote computer, the second set of data structures containing a second grammar
and one or more sub-grammars related to the second grammar (while the speech recognition
system is operating, the figure shows that it will loop back to the input signal to find the next
words until it reaches the end of the sentence; Fig. 1); and (e) repeating steps (b) and (c), using
the second set of data structures in step (c) (Fig. 1 and Fig. 14).

Brown et al. do not teach (a) acquiring from a first remote computer a set of data 
structures that contain a grammar and one or more sub-grammars related to the 
grammar. 

Ehsani et al. do teach (a) acquiring from a first remote computer a set of data
structures that contain a grammar and one or more sub-grammars related to the grammar (uses
an application of recognition "grammars" via "remote" voice control (page 11, paragraph 0200);
grammar such as word, phone, and states are used in data structure. Ehsani describes the
recognition "grammar", which uses "phonetic" transcription, "word" sequences, and
probability (states) to process the voice commands; page 11, paragraph 0212).

Therefore, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to combine Brown et al.'s data structure into Ehsani's remote
appliance/computer because this would provide users with flexibility, giving users the
capability to access external databases or control applications or devices, as taught by Ehsani,
page 11, paragraph 0200 and page 13, paragraphs 0230-0231.

Conclusion 

5. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office 
action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is 
reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO 
MONTHS of the mailing date of this final action and the advisory action is not mailed until after 
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 
CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event,
however, will the statutory period for reply expire later than SIX MONTHS from the date of this 
final action. 

6. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Myriam Pierre whose telephone number is 703-605-1196. The
examiner can normally be reached on Monday - Friday from 5:30 a.m. - 2:00 p.m.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
MP 03/17/2006 